-
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added bottleneck for nan calculations #306
Conversation
I would categorize bottleneck as an optional dependency within a new category called Feel free to add the relevant entry here: https://github.com/neurodata/treeple/blob/main/pyproject.toml In addition, can you add a CHANGELOG entry? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #306 +/- ##
==========================================
+ Coverage 78.53% 78.66% +0.13%
==========================================
Files 24 24
Lines 2250 2264 +14
Branches 413 417 +4
==========================================
+ Hits 1767 1781 +14
Misses 352 352
Partials 131 131 ☔ View full report in Codecov by Sentry. |
Do you know how other packages test optional dependencies? It would be great to add a unit-test that runs the stuff w/ and w/o bottleneck to assert the answer is the same. |
EDIT: Sorry I didn't notice your comment above. Ya I wanted to add a test too. Right now bottleneck only touches two things, @adam2392 Would you mind taking a look again. I made a couple changes to make it testable, added a test, and changed an existing test. I am failing codecov, but I'm not sure where, the link from codecov isn't showing me the file. I am sure it's me. |
Interesting, ok thanks. I'll fix it. |
@adam2392 I fixed coverage locally. However, by design, bottleneck isn't included in the build because it's optional. I can add it to the build or test requirements files and that would force it into the CI process. Do you have a preference for which one? |
Yes in the CI that tests coverage, we should add the extra group of installs. |
@adam2392 I got it added to the build step by overriding pip install in spin. It looks like the mac builds for python 3.9/10 aren't happy about something, but the others are ok. I don't have a mac to test it. Do you know have you seen this behavior before? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the changes made in spin commands. In the GH actions workflow for build_and_test_slow add a pip install for 'extras' where the installation occurs.
Pip install .[extras]
build_and_test_slow uses spin for the install too right? Would installing the package via pip not create other issues rather than using spin? |
Would you mind rerunning those actions? If it doesn't go away, then maybe if there is an issue with bottleneck, that we need to know about |
No for some unfortunate reasons, we use pip install from a requirements file: treeple/.github/workflows/main.yml Lines 180 to 186 in 1e62fe3
So I would just add a Note |
Now that I think about it, you can also add bottleneck to the |
@adam2392 would you mind taking a look when you have the chance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! @ryanhausen LGTM once you fix the minor comment.
doc/whats_new/v0.8.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're on v0.10 now so you can just move this diff to that file
@adam2392 sorry I missed your comment I merged with latest from main and updated the version in the docs. Looks like there was a weird error with one of the Mac builds. Doesn't seem related to the code though. |
Cool. Thanks for the PR @ryanhausen! Can you take a look at what the differential is when using jovo's suggestion? I.e. pre-allocate less nans than needed |
Yep! |
Reference Issues/PRs
None
What does this implement/fix? Explain
Changes nan related calculations in treeple.stats.utils to use bottleneck rather numpy. In my testing, this seems to offer a significant performance speed up, see attached doc:
profiling.pdf
I am also attached the cProfile outputs for the benchmarking in the above pdf.
profiles.zip
Any other comments?
I am opening as a draft as I am not sure if you want to make bottleneck a required dependency or optional. And if optional, how you want to categorize it.
Looking forward to your feedback!